Algorithms in Neuroscience
The goal of this lecture is to learn how to use computational principles to understand natural phenomena.
Biological systems, at all scales, have evolved to solve the challenges posed by their environments in order to give themselves the best chance of survival and reproduction. Algorithms are processes or sets of rules used to solve classes of problems, which makes them a powerful tool to systematize and describe these natural strategies.
Marr's Levels of Analysis
Computational: what does the system do, and why does it do these things?
Algorithmic/representational level: how does the system do what it does?
Implementational/physical level: how is the system physically realized?
Reinforcement Learning
Reinforcement Learning (RL) is the process by which an agent learns to act in its environment such that it maximizes its cumulative reward.
Understanding RL is a fundamental challenge for neuroscience because it underpins how the brain learns to behave!
Challenges Facing RL
Curse of Dimensionality i.e. the world is far too complex for brute-force trial and error
Temporal Credit Assignment
Imperfect Information
Exploration-Exploitation Dilemma
...
Temporal Credit Assignment
Using Marr's framework, what Computational problem must RL solve?
An agent must bind rewards to temporally distal prior states and actions.
We must introduce some symbols to describe this problem algebraically.

Let

$$V(s_t) = \mathbb{E}\left[\sum_{k=0}^{\infty} \gamma^k r_{t+k}\right], \qquad 0 \le \gamma < 1$$

That is, the value of the current state is the discounted sum of all the expected future rewards.

To derive an algorithm to learn $V$, we can now recursively define the function as

$$V(s_t) = \mathbb{E}\left[r_t + \gamma V(s_{t+1})\right]$$

This recursive formulation gives us the simple learning rule

$$V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t$$

where

$$\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$$

This is Temporal-Difference (TD) Learning!
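The learning rule above can be sketched in a few lines. This is a minimal illustration, not the notebook's implementation; the state indices, reward, and the values of α and γ are arbitrary choices for the example.

```julia
# TD(0) update: move V(s) toward the bootstrapped target r + γ·V(s′).
function td_update!(V, s, r, s′; α=0.1, γ=0.9)
    δ = r + γ * V[s′] - V[s]   # temporal-difference (reward prediction) error
    V[s] += α * δ              # nudge the value estimate by α·δ
    return δ
end

V = zeros(3)                               # value estimates for states 1..3
δ = td_update!(V, 1, 1.0, 2; α=0.5, γ=0.9) # δ = 1.0 + 0.9·0 − 0 = 1.0
# V[1] is now 0.5
```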
TD Learning in the Brain
How might TD learning be implemented in the brain?
Dopaminergic activity in the substantia nigra pars compacta (SNc) and ventral tegmental area (VTA) has been shown to encode reward prediction errors, which could provide a substrate for neural TD learning.
Exploration-Exploitation Dilemma
To find the optimal policy, agents must thoroughly search the space of possible states. However, practical constraints require the agent to balance this exploration against exploiting known, high-value states.
A common solution is to randomly sample states in proportion to their known value. One of the simplest such sampling algorithms is softmax (Boltzmann) exploration.
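One common scheme that samples in proportion to value is softmax (Boltzmann) selection; a minimal sketch, assuming that choice (the temperature τ and the example Q-values are illustrative):

```julia
# Softmax (Boltzmann) selection: P(a) ∝ exp(Q(a)/τ).
# High τ → near-uniform exploration; low τ → greedy exploitation.
function softmax_probs(Q; τ=1.0)
    z = exp.((Q .- maximum(Q)) ./ τ)   # subtract max for numerical stability
    return z ./ sum(z)
end

function sample_action(Q; τ=1.0)
    p = softmax_probs(Q; τ=τ)
    return findfirst(cumsum(p) .> rand())   # inverse-CDF sampling
end

p = softmax_probs([1.0, 1.0])   # equal values → equal probabilities [0.5, 0.5]
```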
TD Learning Example
Below is an implementation of TD learning that finds the shortest path between two nodes on a graph. The Agent begins at the Start node, marked in red, and travels along the edges, from node to node, until it reaches the Terminal node, marked in blue, and receives a reward. After each move, the Agent updates its estimate of each node's value, which corresponds to how many moves that node is from the reward.
(Interactive simulation: sliders set the Graph Size, Start node, and λ; the Run Sim button starts the agent, and the plot shows the agent's current value estimate at each node.)
(Collapsed helper cells: define the graph maze and the Agent, and implement choose_next_move, the TD-learning update!, step!, run!, reset!, plot, gen_adj_mat, and remove_self_loops!.)
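The core loop of such a simulation can be sketched as follows. This is a self-contained toy on a simple path graph, not the notebook's Tracker/Agent code; the graph shape, random-walk policy, α, γ, and episode count are all illustrative assumptions.

```julia
# TD(0) on a path graph of N nodes: the agent random-walks from node 1
# until it reaches the terminal node N (reward 1), updating each visited
# node's value estimate along the way.
function run_episode!(V, N; α=0.2, γ=0.9)
    s = 1
    while s != N
        s′ = clamp(s + rand([-1, 1]), 1, N)   # step to a random neighbor
        r = (s′ == N) ? 1.0 : 0.0             # reward only at the terminal node
        V[s] += α * (r + γ * V[s′] - V[s])    # TD(0) update
        s = s′
    end
end

N = 5
V = zeros(N)
for _ in 1:500
    run_episode!(V, N)
end
# Learned values rise toward the terminal node, encoding proximity to reward.
```

After training, nodes closer to the terminal carry higher values, which is exactly the distance-to-reward gradient the notebook's plot displays.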